NSF PAR Search | NSF Public Access Repository

Algorithm and Hardware Co-design for Deep Learning-powered Channel Decoder: A Case Study

https://doi.org/10.1109/ICCAD51958.2021.9643510

Zhang, Boyang; Sui, Yang; Huang, Lingyi; Liao, Siyu; Deng, Chunhua; Yuan, Bo (November 2021, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD))

Full Text Available
PERMCNN: Energy-efficient Convolutional Neural Network Hardware Architecture with Permuted Diagonal Structure

https://doi.org/10.1109/TC.2020.2981068

Deng, Chunhua; Liao, Siyu; Yuan, Bo (March 2020, IEEE Transactions on Computers)

Full Text Available
Low-complexity Neural Network-based MIMO Detector using Permuted Diagonal Matrix

Liao, Siyu; Deng, Chunhua; Liu, Lingjia; Yuan, Bo (November 2019, Conference record - Asilomar Conference on Circuits, Systems, and Computers)

Full Text Available
Structured Neural Network with Low Complexity for MIMO Detection

https://doi.org/10.1109/SiPS47522.2019.9020365

Liao, Siyu; Deng, Chunhua; Liu, Lingjia; Yuan, Bo (October 2019, IEEE International Workshop on Signal Processing Systems (SiPS))

Full Text Available
Compressing Deep Neural Networks Using Toeplitz Matrix: Algorithm Design and FPGA Implementation

https://doi.org/10.1109/ICASSP.2019.8683556

Liao, Siyu; Samiee, Ashkan; Deng, Chunhua; Bai, Yu; Yuan, Bo (May 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Full Text Available
PermDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices

https://doi.org/10.1109/MICRO.2018.00024

Deng, Chunhua; Liao, Siyu; Xie, Yi; Parhi, Keshab K.; Qian, Xuehai; Yuan, Bo (October 2018, Proc. 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO))

Deep neural networks (DNNs) have emerged as the most important and popular artificial intelligence (AI) technique. The growth of model size poses a key energy-efficiency challenge for the underlying computing platform, so model compression becomes a crucial problem. However, current approaches have various drawbacks. Specifically, the network sparsification approach suffers from irregularity, a heuristic nature, and large indexing overhead. On the other hand, the recent structured matrix-based approach (i.e., CIRCNN) is limited by relatively complex arithmetic computation (i.e., FFT), a less flexible compression ratio, and an inability to fully utilize input sparsity. To address these drawbacks, this paper proposes PERMDNN, a novel approach to generate and execute hardware-friendly structured sparse DNN models using permuted diagonal matrices. Compared with the unstructured sparsification approach, PERMDNN eliminates the drawbacks of indexing overhead, heuristic compression effects, and time-consuming retraining. Compared with the circulant structure-imposing approach, PERMDNN enjoys the benefits of higher reduction in computational complexity, a flexible compression ratio, simple arithmetic computation, and full utilization of input sparsity. We propose the PERMDNN architecture, a multi-processing element (PE) computing engine targeted at fully connected (FC) layers. The entire architecture is highly scalable and flexible, and hence it can support the needs of different applications with different model configurations. We implement a 32-PE design using CMOS 28nm technology. Compared with EIE, PERMDNN achieves 3.3x-4.8x higher throughput, 5.9x-8.5x better area efficiency, and 2.8x-4.0x better energy efficiency on different workloads. Compared with CIRCNN, PERMDNN achieves 11.51x higher throughput and 3.89x better energy efficiency.
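As a rough illustration of the permuted diagonal structure named in this abstract, the following Python sketch multiplies a block-structured weight matrix by a vector. It is not the authors' implementation. It assumes a layout in which the weight matrix is split into p-by-p blocks and the single nonzero in row i of a block sits at column (i + k) mod p, where k is that block's offset. The names permdiag_matvec, permdiag_to_dense, values, and offsets are hypothetical and chosen only for this example.

```python
def permdiag_matvec(values, offsets, p, x):
    """Compute y = W x for W built from p-by-p permuted diagonal blocks.

    Assumed layout (not taken from the listing): values[r][c][i] is the one
    nonzero in row i of block (r, c), located at column (i + offsets[r][c]) % p.
    """
    block_rows, block_cols = len(values), len(values[0])
    if len(x) != block_cols * p:
        raise ValueError("input length must equal block_cols * p")
    y = [0.0] * (block_rows * p)
    for r in range(block_rows):
        for c in range(block_cols):
            k = offsets[r][c]
            q = values[r][c]
            for i in range(p):
                # The column index is computed from the offset, so no
                # per-weight index needs to be stored.
                j = (i + k) % p
                y[r * p + i] += q[i] * x[c * p + j]
    return y


def permdiag_to_dense(values, offsets, p):
    """Expand the compact form to a dense matrix (for checking only)."""
    block_rows, block_cols = len(values), len(values[0])
    dense = [[0.0] * (block_cols * p) for _ in range(block_rows * p)]
    for r in range(block_rows):
        for c in range(block_cols):
            for i in range(p):
                j = (i + offsets[r][c]) % p
                dense[r * p + i][c * p + j] = values[r][c][i]
    return dense


# One block row, two block columns, block size p = 2.
example_values = [[[1.0, 2.0], [3.0, 4.0]]]
example_offsets = [[0, 1]]
print(permdiag_matvec(example_values, example_offsets, 2, [1.0, 2.0, 3.0, 4.0]))
```

Under this assumed layout each block stores p values plus one offset instead of p^2 weights, and the multiply uses only additions and multiplications, with no FFT.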
Full Text Available
Energy-efficient, high-performance, highly-compressed deep neural network design using block-circulant matrices

https://doi.org/10.1109/ICCAD.2017.8203813

Liao, Siyu; Li, Zhe; Lin, Xue; Qiu, Qinru; Wang, Yanzhi; Yuan, Bo (November 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD))

Deep neural networks (DNNs) have emerged as the most powerful machine learning technique in numerous artificial intelligence applications. However, the large sizes of DNNs make them both computation and memory intensive, thereby limiting the hardware performance of dedicated DNN accelerators. In this paper, we propose a holistic framework for energy-efficient, high-performance, highly-compressed DNN hardware design. First, we propose block-circulant matrix-based DNN training and inference schemes, which theoretically guarantee Big-O complexity reduction in both computational cost (from O(n^2) to O(n log n)) and storage requirement (from O(n^2) to O(n)) of DNNs. Second, we optimize the hardware architecture, especially the key fast Fourier transform (FFT) module, to improve overall performance in terms of energy efficiency, computation performance, and resource cost. Third, we propose a design flow to perform hardware-software co-optimization with the purpose of achieving a good balance between test accuracy and hardware performance of DNNs. Based on the proposed design flow, two block-circulant matrix-based DNNs on two different datasets are implemented and evaluated on FPGA. The fixed-point quantization and the proposed block-circulant matrix-based inference scheme enable the network to achieve as high as 3.5 TOPS computation performance and 3.69 TOPS/W energy efficiency, while the memory is reduced by 108X to 116X with negligible accuracy degradation.
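The complexity reduction quoted in this abstract rests on the standard fact that multiplying by a circulant matrix is a circular convolution, which an FFT turns into an element-wise product. The Python sketch below illustrates that idea with a textbook radix-2 FFT. It is not the paper's hardware design or training scheme. It assumes each b-by-b block is defined by its first column, that b is a power of two, and the names fft, circulant_matvec, and block_circulant_matvec are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circulant_matvec(w, x):
    """y = C x, where C[i][j] = w[(i - j) % n], i.e. w is C's first column."""
    n = len(w)
    if n != len(x) or n <= 0 or n & (n - 1):
        raise ValueError("w and x need equal power-of-two length")
    product = [a * b for a, b in zip(fft(w), fft(x))]
    return [v.real / n for v in fft(product, inverse=True)]


def block_circulant_matvec(blocks, x, b):
    """y = W x, where blocks[r][c] is the first column of a b-by-b circulant block."""
    rows, cols = len(blocks), len(blocks[0])
    if len(x) != cols * b or b <= 0 or b & (b - 1):
        raise ValueError("len(x) must be cols * b with b a power of two")
    # Transform each input segment once and reuse it for every block row.
    x_freq = [fft(x[c * b:(c + 1) * b]) for c in range(cols)]
    y = []
    for r in range(rows):
        acc = [0j] * b
        for c in range(cols):
            w_freq = fft(blocks[r][c])  # could be precomputed offline
            acc = [s + wf * xf for s, wf, xf in zip(acc, w_freq, x_freq[c])]
        y.extend(v.real / b for v in fft(acc, inverse=True))
    return y
```

Each block stores b values rather than b^2, and each block product costs O(b log b) instead of O(b^2). Accumulating in the frequency domain means only one inverse transform is needed per block row.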
Full Text Available
Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Wang, Yanzhi; Ding, Caiwen; Li, Zhe; Yuan, Geng; Liao, Siyu; Ma, Xiaolong; Yuan, Bo; Qian, Xuehai; Tang, Jian; Qiu, Qinru; et al (February 2018, The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18))

Hardware acceleration of deep learning systems has been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed, which is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts general block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. It applies to both fully-connected and convolutional layers and contains a mathematically rigorous proof of the effectiveness of the method. The proposed algorithm reduces computational complexity per layer from O(n^2) to O(n log n) and storage complexity from O(n^2) to O(n), both for training and inference. The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-using, and hierarchical control. Experimental results demonstrate that the proposed framework achieves at least 152X speedup and 71X energy efficiency gain compared with the IBM TrueNorth processor under the same test accuracy. It achieves at least 31X energy efficiency gain compared with the reference FPGA-based work.
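The tradeoff between compression ratio and block size mentioned in this abstract can be made concrete with a small parameter count. This Python sketch assumes an m-by-n layer split into b-by-b circulant blocks, each stored as one length-b vector, with m and n divisible by b. The function names are hypothetical, and padding or bias terms are ignored.

```python
def block_circulant_params(m, n, b):
    """Stored weights for an m-by-n layer built from b-by-b circulant blocks.

    Assumes each block keeps a single length-b defining vector and that
    b divides both m and n (no padding is modeled).
    """
    if b <= 0 or m % b or n % b:
        raise ValueError("b must be positive and divide both m and n")
    return (m // b) * (n // b) * b


def compression_ratio(m, n, b):
    """Dense parameter count divided by the block-circulant count."""
    return (m * n) / block_circulant_params(m, n, b)


# Larger blocks compress more; the ratio equals the block size b.
for block_size in (8, 64, 256):
    print(block_size, compression_ratio(1024, 1024, block_size))
```

Under these assumptions the ratio equals the block size, so choosing b per layer is the knob that trades storage against accuracy.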
Full Text Available
Towards ultra-high performance and energy efficiency of deep learning systems: an algorithm-hardware co-optimization framework

Wang, Yanzhi; Ding, Caiwen; Li, Zhe; Yuan, Geng; Liao, Siyu; Ma, Xiaolong; Yuan, Bo; Qian, Xuehai; Tang, Jian; Qiu, Qinru; et al (February 2018, AAAI'2018)

Full Text Available
